[SPARK-27070] Fix performance bug in DefaultPartitionCoalescer #23986
fitermay wants to merge 4 commits into apache:master from
Conversation
The indent is off here. Returning the result of a subtraction in compare can overflow, though I don't think it can happen in practice here. Still, see below, I think we can just remove this.
When trying to coalesce a UnionRDD of two large FileScanRDDs (each with a few million partitions) into around 8k partitions, the driver can stall for over an hour. A profiler shows that over 90% of the time is spent in TimSort, which is invoked by `pickBin`. This patch replaces sorting with a more efficient `min` for the purpose of finding the least occupied `PartitionGroup`.
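The core of the change can be sketched in a standalone snippet (`Group` here is a hypothetical stand-in for the real `org.apache.spark.rdd.PartitionGroup`, and this is not the actual Spark code):

```scala
// Picking the least occupied group with a linear-time min instead of a sort.
case class Group(numPartitions: Int)

val groupArr = Seq(Group(3), Group(1), Group(2))

// Before: O(g log g) per pickBin call -- sort everything, read the head.
val bySort = groupArr.sortBy(_.numPartitions).head

// After: O(g) per call -- a single scan for the minimum.
val byMin = groupArr.minBy(_.numPartitions)

assert(bySort == byMin && byMin.numPartitions == 1)
```

Both expressions pick the same group; only the cost per call differs, which matters when `pickBin` is invoked once per input partition.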
@srowen Hi, thanks for the prompt review. I've amended the code to address your comments.
Test build #4597 has finished for PR 23986 at commit
@fitermay It would be very nice to see some rough numbers regarding the improvement caused by this PR. Could you please share how it behaved before and after for the experiment mentioned in the description?
There's a little detail in the JIRA that's pretty suggestive that this is the bottleneck; if there's a stack trace or more numbers to show, that's great. Regardless, I think this is a clean 'win', just a question of how big.
@srowen @attilapiros It turns out EMRFS returns the string '*' as the host of each block. This ends up invoking the worst case of this algorithm, where it tries to jam everything into the same preferred partition. In turn, this ends up running a sort on hundreds of thousands of records each iteration to find the minimum. I've contacted the EMR team to suggest changing the host to 'localhost', but apparently that would break MR performance on YARN. I still think this patch is a win because:
I will try to make the suggested changes and also generate some performance numbers for the extreme case tonight. Thanks!
Benchmark with 100K blocks instead of several million. Number of hosts = 1 is clearly the worst case. After patch:
@fitermay @attilapiros From the benchmark it seems that hosts = 1 is the absolute worst case and this change improves it by a decent margin. It improves the other cases slightly.
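A rough cost model suggests why the single-host case is the worst one (hypothetical numbers, not the benchmark figures above): with one host every partition lands in the same preferred bucket, so the old code sorts the whole group list on every placement.

```scala
// Approximate comparison counts for placing p partitions into g groups.
val p = 100000L // partitions to place (assumed, matching the 100K benchmark)
val g = 8000L   // target coalesced groups (assumed)

// Sorting per placement: ~p * g log2(g) comparisons in total.
val sortComparisons = p * (g * math.log(g) / math.log(2)).toLong

// A linear min scan per placement: ~p * g comparisons in total.
val minComparisons = p * g

assert(sortComparisons > minComparisons)
```

The gap grows with the log factor, which is why the improvement is largest exactly in the degenerate one-host case.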
attilapiros
left a comment
Impressive results.
Thanks for the improvement!
By the way, these are the results from the original PR before replacing
Hm! That's surprising. Looking at `min` vs `minBy`, it even seems like `min` has more indirection (it calls `foldLeft`). The implicit still involves calling a function to compare and get the number of partitions in both cases. If you're pretty sure this is accurate, I'm OK returning to the implicit.
Can I check the reason for this difference on Monday/Tuesday? I mean, can we wait with the merge?
Hi, @fitermay. Thank you for your first contribution. I saw the good comments above, and I left a few comments, too.
After fixing those, we can trigger Jenkins for your PR. Otherwise, it will fail to build.
ok to test
Test build #103221 has finished for PR 23986 at commit
|
@srowen Sets up the lambda that's passed into `minBy`. Notice that the return type of the closure must be
The lambda first invokes the below function, whose only job is to box the primitive int
Then the actual method that returns
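The boxing described above can be sketched in isolation (this is an assumption based on the discussion, not decompiled Spark bytecode; `Group` is a hypothetical stand-in for `PartitionGroup`):

```scala
// minBy's key function is a generic A => B, so an Int key is boxed to
// java.lang.Integer on every element; a hand-written Ordering that compares
// the int fields directly avoids the per-element key boxing.
case class Group(numPartitions: Int)
val groups = Seq(Group(5), Group(2), Group(7))

// minBy: the Int returned by the key function goes through the erased B.
val viaMinBy = groups.minBy(_.numPartitions)

// min with a dedicated Ordering: compare works on unboxed ints.
val groupOrdering: Ordering[Group] = new Ordering[Group] {
  override def compare(o1: Group, o2: Group): Int =
    java.lang.Integer.compare(o1.numPartitions, o2.numPartitions)
}
val viaMin = groups.min(groupOrdering)

assert(viaMinBy == viaMin && viaMin.numPartitions == 2)
```

Both forms return the same group; the difference is only in how many temporary `Integer` objects each comparison allocates.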
- Use `min` with Ordering instead of `minBy` to avoid boxing overhead
- Add benchmark results
- Add benchmarks description
- Don't use DebugFilesystem in benchmark
- Fix scalastyle
- Fix some minor existing codestyle issues in CoalescedRDD.scala
@dongjoon-hyun @srowen Pushed these changes:
Test build #103271 has finished for PR 23986 at commit
Test build #103275 has finished for PR 23986 at commit
retest this please
sounds good to me
Test build #103278 has finished for PR 23986 at commit
@dongjoon-hyun
Merged to master
```scala
implicit val partitionGroupOrdering: Ordering[PartitionGroup] =
  (o1: PartitionGroup, o2: PartitionGroup) =>
    java.lang.Integer.compare(o1.numPartitions, o2.numPartitions)
```
Hi, All.
This seems to break the Scala 2.11 build.
```
[error] ../core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala:161: type mismatch;
[error]  found   : (org.apache.spark.rdd.PartitionGroup, org.apache.spark.rdd.PartitionGroup) => Int
[error]  required: Ordering[org.apache.spark.rdd.PartitionGroup]
[error]     (o1: PartitionGroup, o2: PartitionGroup) =>
[error]
```
Thanks. That’s unfortunate. I’ll fix it later tonight
Then, let me revert this first to recover the 2.11 build for the other PRs. Since this PR is already approved, I believe that the next PR will be easily accepted, @fitermay.
@dongjoon-hyun @srowen: Would it be a good idea to extend the PR builder to run a compile with scala 2.11 (without any test run)?
I know it is an extra 10-15 minutes, but given the 4-hour test run it might be worth it to prevent such situations; on the other hand, this must be very rare. What is your opinion?
I agree with you, @attilapiros. But, IIRC, there was a discussion on that issue, and the decision at that time was that the current cost is not high enough for that.
The committers have a responsibility to monitor their commits, and we are usually able to do a HOTFIX or revert in a short time.
Ok, thanks.
Yes, it must be very rare.
We're going to drop 2.11 support soonish anyway, so I think for now we accept the occasional breaks and fix after the fact rather than double the PR builders.
This is reverted via 4bab69b.
The Scala 2.11 build is recovered and testing is now on the way.
@fitermay I guess it has to be more explicitly constructed as an
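Scala 2.11 lacks SAM conversion for traits like `Ordering` (2.12 added it), which is why the lambda form above fails to compile there. A 2.11-compatible version can be sketched by spelling out the anonymous class (a hypothetical standalone example; `Group` stands in for the real `PartitionGroup`):

```scala
// Explicit Ordering construction that compiles on both 2.11 and 2.12.
case class Group(numPartitions: Int)

implicit val groupOrdering: Ordering[Group] = new Ordering[Group] {
  override def compare(o1: Group, o2: Group): Int =
    java.lang.Integer.compare(o1.numPartitions, o2.numPartitions)
}

// With the implicit in scope, min picks the least occupied group.
assert(Seq(Group(4), Group(1), Group(9)).min.numPartitions == 1)
```

The anonymous-class form carries the same behavior with no reliance on the 2.12-only lambda-to-trait conversion.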
When trying to coalesce a UnionRDD of two large FileScanRDDs
(each with a few million partitions) into around 8k partitions
the driver can stall for over an hour.
Profiler shows that over 90% of the time is spent in TimSort
which is invoked by `pickBin`. This patch replaces sorting with a more
efficient `min` for the purpose of finding the least occupied `PartitionGroup`.